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Abstract 



In the present paper we investigate methods related to both the Singular Spectrum Analysis 
(SSA) and subspace-based methods in signal processing. We describe common and specific features 
of these methods and consider different kinds of problems solved by them such as signal reconstruc- 
P^ ' tion, forecasting and parameter estimation. General recommendations on the choice of parameters 

to obtain minimal errors arc provided. Wc demonstrate that the optimal choice depends on the 
particular problem. For the basic model 'signal + residual' wc show that the error behavior depends 
on the type of residuals, deterministic or stochastic, and whether the noise is white or red. The 
structure of errors and the convergence rate are also discussed. The analysis is based on known 
theoretical results and extensive computer simulations. 

c/3 , Keywords: Singular Spectrum Analysis, time series analysis, subspace-based methods, signal 

processing, forecasting, linear recurrent formula, ESPRIT, frequency estimation. 
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■^ ; 1 Introduction 

l> 

^ I In the 1970-80s many papers describing methods based on the Singular Value Decomposition (SVD) of 

• ' a specially constructed Hankel matrix have been published, see [4, 6, 7, 11, 22, 25] among many others. 

Q ■ Many modern methods are based on the ideas proposed in these papers and these methods have proved 

O . to be very useful in many applied areas. The list of the methods includes HSVD (Hankel SVD, [3]) 

and HTLS (Hankel Total Least Squares, [35]) which are mostly used in the nuclear magnetic resonance 
spectroscopy, ESPRIT [30] used in the Direction-Of- Arrival problems, and Singular Spectrum Analysis 

^ , (SSA) which has been applied in many different areas. 

H I All these methods have several common features (up to using the same algorithms), but each of 

them also has some special features. The purpose of this paper is to show some commonalities and 
specifics of these methods and to analyze the optimal choice of parameters. Since SSA is not limited 
to any particular area of application, we mostly base our investigation on SSA. 

Let us briefly describe Basic SSA [15] for the real-valued case. Let us observe a one-dimensional 
time series F^r = {fo, ■ ■ ■ , In-i) of length N. We suppose that Fjy is a sum of several unknown 
but identifiable components and we are interested in some of them, for example, trend or regular 
oscillations. 

Embedding Given a window length L {1 < L < N) one proceeds with constructing K = N — L + 1 
lagged (L-lagged) vectors Xi = (/i-i, . . . , fi+L-.2)^ , I < i < K, and composing them into the matrix 
X = [Xi : . . . : Xk], which is called the L-trajectory matrix. Note that X is a Hankel matrix. 
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Decomposition The key step in SSA is the SVD of the trajectory matrix: 



d 

^ = TVku^v;^, (1) 



=1 



where Ai > . . . > A^ > are the positive eigenvalues of the matrix C = XX arranged in the 
nonincreasing order, {Ui, . . . , U^} is the orthonormal system of the corresponding eigenvectors of the 
matrix C {Ui are left singular vectors of X), and Vi = X.'^Ui/y/Xi, i = 1, . . . ,d, are the factor vectors 
(right singular vectors). Note that Ui, . . . ,Ud form an orthonormal basis of the column space C^ of 
X, which is called the trajectory space, Cd = span(C/i, . . . , Ud)- The triple (y/Xi, Ui, Vi) is called the 
i-th eigentriple (or ET) and y/Xi is called the i-th. singular value of X. 

Grouping After grouping the eigentriples by choosing a partition of the set {1, . . . ,d} onto m disjoint 
subsets Ii, . . . , Im, one obtains the decomposition 

x = x,, + ... + x,„, 

where X/ = ^ ^/XiUiV^ . Alternatively, we can consider the grouping step as a decomposition of the 

m 

trajectory space into the orthogonal sum of subspaces: i2rf = C^^>, where C^^> = span(L^j,i E Ik). 

k=l 

Reconstruction (diagonal averaging) One can reconstruct the components of the original series 
by the diagonal averaging of each component X/^, : 

Fn = F(i) + . . . + F^'") , 

where F^^' = (/q , . . . , /jv-i) ^-'^d /^ is the average value along the i-th secondary diagonal of X/^,. 

Although the described algorithm is designed to decompose the original time series into a sum of 
an arbitrary number of components, we will be considering the problem of decomposition into a sum 
of two time series (shortly, t.s.) components. The problem of decomposition into several components 
can be reduced to the sequential extraction of the components one by one (e.g., first by extracting the 
trend with a small window length and then by extracting the periodicity from the residual using a 
large window length, see [15]). In addition, a considerable number of methods of time series analysis 
(including ESPRIT examined in this paper) solve the problem of analyzing noisy signals and, in fact, 
consider only two t.s. components: signal and noise. 

Thus, suppose that we observe F/v = Sn + Rn, where Sn is the component of interest (e.g., signal) 
and Rjsf is the residual component. The residual i?Ar can be either a random noise or a deterministic 
component, or a mixture of both. 

The estimator Sn = F^^' of the time series Sn (so called reconstructed signal) can be considered 
as the main result of application of SSA. Appropriate reconstruction of the signal can be obtained in 
the case of its approximate separability from the residual. Typical examples of pairs of approximately 
separable t.s. components (see [15] for details) are trend (i.e., a slowly- varying component) and noise, 
trend and cyclic components, and two cyclic components with different frequencies. In general, we 
can talk about approximate separability of one of the following t.s. components: trend, cycle or noise, 
from a mix of the other ones. 

Suppose that the t.s. component Sn, which we are interested in, satisfies the following two 
properties. First, let Sjy be deterministic. Second, let Sn be leading, i.e. Sn generates the r leading 
SVD components in the decomposition (1) with indices from I = {1, . . . ,r}. In fact, we suppose that 
the minimal singular value of the trajectory matrix S of Sn is larger or asymptotically larger than 
the maximal singular value of the trajectory matrix R of Rn- 



The deterministic structure of Sn and its approximate separability imply that Sn can be approxi- 
mated by a time series of finite L-rank [15]. This means that in (1) all except the r leading eigenvalues 
are close to zero. In this paper we investigate the asymptotic properties as the time series length N 
tends to infinity. We consider an infinite time series F = (/o, . . . , fn, ■ ■ ■) and analyze the finite-length 
time series F^ consisting of its first N terms. For an infinite time series S, the following assertion is 
valid: S has finite rank r (its L-trajectory space Cr has dimension r for any L > r) iS S is governed 
by a linear recurrent formula (LRF) [15]. For finite-rank subseries S]\[ the SVD of L-trajectory matrix 
S has exactly r positive singular values and S is rank deficient. If Sjsf is interpreted as a signal, then 

(s) 

its trajectory space Cr is called the signal subspace. 

Note that if a time series is governed by an LRF, then it can be represented as a sum of prod- 
ucts of polynomials, exponentials and sinusoids, and in this case it makes sense to use a parametric 
setup for the problem. On the other hand, the SSA algorithm of signal reconstruction is essentially 
nonparametric and can be applied to a class of time series which is wider than the set of perturbed 
finite-rank signals. 

Statements of problems within the framework of SSA can differ in the following aspects. 

Al Features of interest: we can be interested either in the signal Sn as a whole or in some of its 
characteristics. In particular, if Sn has finite rank, then it has a known parametric form and 
we can be interested in parameter estimation. The most elaborated problem is the estimation 
of damping factors and frequencies of exponentially damped sinusoids in noise. For solving this 
estimation problem it is sufficient to know only r leading eigenvectors in (1), more specifically, 
the subspace spanned by these eigenvectors (i.e., the estimated subspace of the signal Sjy); see 
e.g. ESPRIT-like methods. 

A2 Type of residuals: the residual Rn is either deterministic or stochastic (or it has both random 
and deterministic components). These cases correspond to different properties of the SSA de- 
composition and cause different characteristics of estimators. For example, a finite-rank Sn will 
be the leading component if the deterministic residual is bounded by some constant, while white 
noise can have any variance for large time series lengths A^ and window lengths L. Also, the 
structure of the stochastic noise (e.g., white or red) can infiuence the behavior of estimation 
errors. 

A3 Choice of the window length: either we can vary the window length L or L is fixed. In the former 
case, the problem of the optimal choice of L arises. Then, the asymptotic behavior depends on 
whether L tends to infinity as A^ — )• oo or not. If we consider a matrix L x K with a fixed 
number of rows as an input, then the only way is to fix L and to consider K tending to infinity. 
The other possible reason for choosing a not very large L is the computational cost. However, 
there are recent computational advances, which make calculations very fast, see 



In this paper, we consider different statements of the problem within Aspect Al and then analyze 
errors and parameter choice rules following Aspects A2 and A3. 

The main information about the time series structure that we obtain within the framework of SSA 
is contained in the set of eigentriples {y/Xi, Ui,Vi). Consequently we obtain not only the reconstructed 
signal but also much additional information about S^. In addition to the problem of reconstructing 
Sn, we consider the problems of signal forecasting and signal parameter estimation. 

Note that the SVD is determined only by the set of eigenvectors C/j, since Aj = ||X^C/j|p and 
Vi = 'SJUi/y/Xi, where X is the trajectory matrix of the observed time series. Consequently, S = 
^[^^ UiU^'K = PrX, where Pr is the orthogonal projection on Cr- Therefore, we can say that the 
set of eigenvectors (along with the time series F]\r) completely determines the whole SVD expansion 



and, therefore, the results of forecasting and parameter estimation. Thus it is natural to start the 
investigation with the estimation of eigenvectors or, equivalently, the eigenspace estimation. Note 
that the problem of estimation of the factor vectors Vi becomes the problem of estimation of the 
eigenvectors Ui by changing the window length from L to A^ — L + 1 . 

Let us remark that the transformations applied to the eigenvectors in order to obtain, for example, 
the frequency estimates can tremendously change the structure of estimation errors. Nevertheless, in 
Section 2 we consider the errors of signal subspace estimation as the starting point of the investigation. 
Sections 3 and 4 contain the results on reconstruction and forecasting based on the chosen subspace. 
Section 5 is devoted to parameter estimation within the framework of the subspace-based methods of 
signal processing including ESPRIT. In Sections 2-5 we give general recommendations on the choice 
of the window length L which are based on simulations and known theoretical results. In Section 6 
we consider the convergence rate in different conditions (a fixed window length L or a window length 
proportional to the time series length) for the problems investigated in previous sections. 

Let us remark that the results of Sections 2-6 are valid under the condition of strong separability 
of the signal from the residual. Section 7 deals with several examples, in which there is no strong 
separability. In Section 8 we consider several versions of Basic SSA and demonstrate some examples 
of application of the versions designed for the stationary time series to non-stationary ones. 

In Section 9 we briefly describe some origins of the SVD providing the key step of the SSA 
algorithm. We are interested in these origins, since they imply different views on the problem statement 
and on the parameter choice. 

2 Signal subspace 

We will generally rely on the results of the paper [27], which is devoted to the discussion of convergence 
and also contains the main error terms and their upper bounds. 

As a measure of the error for the subspace approximation we consider the spectral norm of the 
difference of projectors on the true subspace and the estimated subspace. Note that this norm is 
equal to the sine of the largest principal angle between these subspaces. The aim of this section is to 
investigate the dependence of the approximation error on the window length. 

(s) 

Let Sm be a signal of rank r. By P^ we denote the orthogonal projector on the signal subspace 
Cr , which is spanned by the left singular vectors U^ , . . . , Ur of the signal trajectory matrix S, and 
by Pr we denote the orthogonal projector on the estimated signal subspace Cr = span(?7i, . . . , Ur), 
where Ui, . . . ,Ur are the r leading left singular vectors of the trajectory matrix of the observed time 

II (s) 

series F]sr. Note that we can easily calculate the estimation error ||Pr — Pr||, since the cosine of the 
largest principal angle between Cr and Cr is equal to the r-th eigenvalue of the matrix \Jr \jj, where 
Ul'^ = [C/f ^ : . . . : C/i'^] and Ur = [Ui : . . . : Ur] (see e.g. [5, p. 18]). 

Let us consider five examples of time series fn = Sn + rn, n = I, . . . , N, 

Sr, = 1, r„ = -c(-l)", (2) 

s„ = 6"cos(27rn/10), r„ = c, (3) 

Sn = 6"'cos(27rn/10), r„ = aSn, (4) 

6" cos(2^n/10), rn = (ae„ + c)/V2, (5) 



s 



s„ = 6"cos(27rn/10), r„ = (jr/„. (6) 

Here £n is a white gaussian noise with variance 1 and r/„ is the autoregressive process of order 1 (red 
noise) with parameter a and variance D?7„ = 1, that is, ??„ = ctVn-i + ^n, where e^ has variance 1 — a^. 
In this section, we set c = a = 0.1, a = 0.5, b = 1, and N = 100. 



We choose the level of noise in the time series (3)"(6) to have the same signal-to-noise ratio (SNR), 
which is conventionally determined as the ratio of the average of squared signal values to the average 
of squared residual values (or to the variance of residuals if they are random) . 

Generally, the SNR does not determine the size of the errors of estimates obtained by the SSA 
processing, since the SNR does not take into consideration the time series length. In fact, the SNR 
can be used to compare the quality of processing of time series of equal lengths. However, we cannot 
say that SSA separates signal and noise only if the SNR is larger than a specific value; for example, 
for any small SNR a sine-wave signal is asymptotically separated from a white noise as iV — >• oo and 
L ~ /3iV, < ^ < 1. 

The time series (2) is included, since we can compare the results with those in [27, Section 4.2.1]. It 
appears that the main term of perturbation found in [27] is almost equal to the whole error. Moreover, 
the behavior of errors depends on whether the lengths of the window and the time series are even 
or odd. In addition, this is an example of a time series that produces the projector error having the 
first-order term with respect to the perturbation level which is not the main term of the error as 
iV-> oo. 

The time series (3)-(5) differ in the structure of residuals: deterministic, random, or combined. 
The time series (6) is used to consider a noise that differs from the white noise. 

In the case of random residuals, we compute either MSD (mean square deviation) or RMSE (square 
root of mean square error) as a measure of accuracy. Generally, these criteria yield very similar results. 
The difference between them is that we can compute the square root before averaging (MSD) or after 
averaging (RMSE) for the simulation results. In the examples of this section we estimate MSD using 
100 simulations. 

We present the results of simulation study for the time series (2)-(6) in Figures 1-7. Figures 1 

(s) 

and 2 show the errors in estimation of the projector P^ on the signal subspace for the examples 
with deterministic residuals. One can see a tendency for the errors to increase as the window length 
increases. However, for window lengths L that are divisible by periods of the time series components, 
errors generally decrease. This reflects the influence of the multiplicity of L and/or K to the periods 
of time series components. Note that if both L and K are divisible by the periods (by 2 in the first 
example and by 10 in the second one), then the projector perturbation is equal to 0. This corresponds 
to the case of bi-orthogonality of the trajectory matrices of Sn and R^ and therefore to the case of 
exact separability. If only L (or K) is divisible by the period, then this case can be called left (or 
right) orthogonality. Thus, if the residual is deterministic and Si\f contains a periodic component, 
then we observe two effects: the specific behavior of errors in the case of window lengths divisible by 
the period and a periodic behavior of errors in the general case. 

It is clear that if the residual Rn contains noise, then the exact orthogonality cannot be achieved. 
Figures 3 (the logarithmic scale) and 4 (the original scale) demonstrate that the decrease of errors for 
special window lengths does not occur. 

In the case of combined residuals (Fig. 5), the behavior of projector errors inherits the properties 
of errors for both pure random and deterministic residuals. Below we show that this feature is valid 
for other kinds of problems. 

To show that the fact that the noise is red (rather than white) does not interfere with the extraction 
of the signal, we consider the time series (6) with a red noise, see Fig. 6. Fig. 7 compares MSD for 
different structures of residuals (recall that SNR is the same) . One can see that a red noise yields a 
slightly worse accuracy. Errors for the time series (5) lie between those for the time series (3) and the 
time series (4). 

To summarize, in this section we demonstrated the behavior of errors of projector estimates for 
different types of residuals. However, we are usually interested in certain features of the signal space, 
rather than in the projector itself. Therefore, the results of this section provide just the basic infor- 



mation, which can be used for explanation of further results. 
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Figure 1: MSD of projector estimates: deterministic residuals, t.s. (2) (log-scale) 
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Figure 2: MSD of projector estimates: deterministic residuals, t.s. (3) (log-scale) 




Figure 3: MSD of projector estimates: white-noise residuals, t.s. (4) (log-scale) 




Figure 4: MSD of projector estimates: white-noise residuals, t.s. (4) (original scale) 
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Figure 5: MSD of projector estimates: mixed residuals, t.s. (5) (log-scale) 
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Figure 6: MSD of projector estimates: red-noise residuals, t.s. (6) (log-scale) 
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Figure 7: MSD of projector estimates: different types of residuals, t.s. (3)-(6) 

3 Signal extraction 

Recall that the reconstructed signal is obtained by applying the diagonal averaging to the reconstructed 
matrix calculated by the formula S = Uj.Ar Vj, where Ur = [Ui : . . . : Ur], Vr = [Vi : . . . : Vr], 
and Ar = diag(Ai, . . . , A^); Ui, Vi and Aj are defined in (1). Note that the columns of the matrix Ur 
form a basis of the perturbed signal subspace for the window length L while the columns of V^ form 
a basis of the perturbed signal subspace for the window length A^ — L + 1. First, this means that the 
results are the same for the window lengths L and A^ — L + 1. Then, as is shown in Figures 1-7, the 
signal-subspace perturbation grows as L increases and decreases as N — L + 1 increases. Thus, the 
resultant errors are caused by these contradictory tendencies. Fig. 4 demonstrates that the growth 
rate of errors in the projector estimation is larger for large L > N/2. Therefore, there is no surprise 
that the reconstruction errors are large for small window lengths due to large errors in Vj.. However, 
the question about the optimal window lengths remains open. 

Further we consider the dependence of the reconstruction error s; — si on the window length in 
several examples. Note that an explicit asymptotic form is known for the example of noisy constant 
signal with Sn = c [17]. The paper [17] contains an explicit expression for the variance of the first-order 
reconstruction errors, where the first order is given with respect to the perturbation of the signal by 
Rn and not with respect to N. Strictly speaking, it is not necessary for the first-order error to be 
the main term of the error as A^ — t- oo. It has been checked by computer simulations that in the case 
of pure random noise we can consider the first-order error as the main error term (it is not true in 
the general case, see [27]). Computer simulations confirm that the qualitative results for the constant 
signal are valid for many other types of signals as well, including oscillations. To describe these results 
let us present the formula for the dependence of the asymptotic errors on the window length for a 
constant signal [17]. 

Let the window length L r^ f3N, < /3 < 1/2, and I be the index of the time series point, I ~ ^N/2, 
0<7< 1, asA^— T-oo. The value 7 = 1 corresponds to the middle of the time series; consequently, we 
present the formula for the first half of the time series with window lengths smaller than one-half of 
the time series length. Then the variance of the first-order errors has the following asymptotic form: 



■l?i(/3,7), 0<7<2min(/3, 1-2/3), 

■ 2/3) < 7 < 2/3, 
D3{(3,-f), 2/3<7<l, 



sp^~^^Z^2(/3,7), 2min(/3,l 



(7) 



as A^ — 7- oo, where 



^i(/3,7) 



^2(/3,7) 



12/32(1-^)2 
+ 4/3(3-3/3 + 2/32 
1 



(7'(l + /3)-27/3(l + /3)^ 



7^ + 27^ (3/3 - 2 - 3/3^) 



^3(/3,7) 



6/32(1-/3)^72 
+ 272 (3 - 9/3 + 12/32 - 4/3^) 
+ 47 (-1 + 4/3 -3/32 -4/3^ + 4/3^) 

+ (8/3-56/32 + 144/3^-160/3^ + 64/3^)), 

2 

3^' 



The points of change between the cases in (7) correspond to I = K — L (i.e., 7 = 2(1 — 2/3)) and / = L 
(7 = 2/3). The former point of change is present if ET < 2L (/3 > 1/3). Note that formula (7) can be 
extended to the window lengths 2<L<A^ — 1(0</3<1) and to the indices of time series points 
0<l<A^ — 1(0<7<2) due to the symmetry of errors with respect to the middle of the time series 
and by the equivalence of results under the substitution of L for iT (/3 -H- 1 — /3). 

When we solve the problem of minimizing RMSE of estimation of s/ at a fixed point I, the optimal 
window length varies from iV/3 to N/2, see [17]. This means that even in the case with a constant 
signal the optimal window length, which minimizes the reconstruction errors as a whole, depends on 
the importance (weights) of each point of the time series. In any case, the general recommendation 
is to choose the window length slightly less than one-half of the time series length N. Note that the 
optimal window length provides a considerable improvement in the error rate (with respect to the 
choice L = N/2) at the edge time series points, that is, for l/N ~ 0. 

It has been shown for the projectors in Section 2 that for a noisy sine-wave signal (i.e. if the 
residuals do not contain a deterministic component), the divisibility of the window length by the 
sine-wave period is not an important issue. The presence of a deterministic component in the residual 
makes this divisibility important. A similar effect takes place for the reconstruction errors ||57v — S'atH, 
see Fig. 8 for the time series (3)-(6) with the same parameters. 
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Figure 8: RMSE of signal estimates: different types of residuals, t.s. (3)-(6) 
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Figure 9: RMSE of signal estimates: mixed residuals, t.s. (5) 
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Figure 10: RMSE of signal estimates for the last 10 points: mixed residuals, t.s. (5) 



To study the influence of the window length on RMSE, we consider RMSE for the reconstruction 
of the ten last points of the signal. Comparison of Fig. 9 and Fig. 10 shows that the impact of the 
divisibility of the window length by the sine-wave period is stronger for the edge points. 

In Fig. 8 we observe that the optimal window length is close to 0.4A^ in the case of random residual. 
However, the divisibility of the window length by the period (if the residuals contain a deterministic 
component) can be more important than the adjustment, e.g., than the transition from N/2 to 0.4A'^. 

4 Recurrent SSA forecast 

4.1 Theory 

Let us consider the algorithm of recurrent forecasting [15] from the viewpoint of its connection to the 
signal subspace estimation. We start with some definitions. 

Definition 4.1. A time series Sn = {silj^g is governed by a linear recurrent formula (LRF), if 
there exist ai, . . . ,at such that 



Si+t = ^ akSi+t-k, 0<i < N -t, at / 0. 

fe=i 



(8) 
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The number t is called the order of the LRF, ai, . . . ,at are the coefficients of the LRF. If t is the 
minimal order of an LRF that governs the time series Sn, then the corresponding LRF is called 
minimal. 

Note that if the minimal LRF governing the signal Sn has order r, then Sn has finite rank r. 

t 
Definition 4.2. A polynomial Ptilj) = /f^* ~ X] '^fc/^*~ is called a characteristic polynomial of the 

fc=i 
LRF (8). 

Let the time series 5 = (sq, . . . , s^, • • •) satisfy the LRF (8) with at 7^ and i > 0. Consider the 
characteristic polynomial of the LRF (8) and denote the multiplicities of its (in general, complex) 
different roots fii, . . . , fip by km, where 1 < m < p, ki + . . . + kp = t. Note that the polynomial roots 
are non-zero as at ^ 0. Then the following well-known result (see e.g. [15, 18]) provides an explicit 
form for the series which satisfies the LRF. 

Theorem 4.1. The time series S = (sq, . . . , Sn, ■ ■ ■) satisfies the LRF (8) for all i > iff 

p /fcm-i \ 

m=l \ j=o / 

where the complex coefficients Cmj depend on the first t points sq, . . . , st-i- 

Note that if the LRF is not minimal, then the corresponding characteristic polynomial has extra- 
neous roots. The extraneous roots do not affect the time series behavior, since the coefficients Cmj for 
the corresponding summands are equal to zero. However, if one applies the LRF to the perturbed ini- 
tial terms sq, • • • ,st-i, then the extraneous roots start to affect the forecasting results. Therefore, the 
extraneous roots with modules greater than 1 are the most hazardous, since the extraneous summand 
/i", caused by an extraneous root ^, |/u| > 1, grows to infinity. 

Unfortunately, if one analyzes/forecasts a real-life time series, the minimal LRF cannot be esti- 
mated with an appropriate accuracy and hence the presence of extraneous roots should be taken into 
account. The estimated LRF can be used both to find a parametric form of the signal (then we should 
remove extraneous roots) and also to forecast the time series (then we do not need to know the values 
of the polynomial roots; however, we would like to have no extraneous roots beyond the unit circle). 

Finding the LRF Let Pi ... P^ be an orthonormal basis of the signal subspace Cr = spanjPi . . . Pr} 
and C^^i I = span{Pr+i • • • Pl} be its orthogonal complement. Denote A = {ai-i, . . . , ai, —1)^ G 
^r+i Li flL-i 7^ 0. Then the time series satisfies the LRF Sn = '^k=i o-k^n-k, n = L — 1, . . . ,N — 1. 
Conversely, if a time series is governed by an LRF, then the LRF's coefficients B = {ol-i, . . . , ai)""" 

B \ As) 



complemented with — f yield the vector I I G ^r+i L- ^^^ LRF that governs the time series can 

is) 

be treated as a forward linear prediction. In addition, if we consider a vector from C],!^ ^ with —I as 
the first coordinate, then we obtain the so-called backward linear prediction [33]. 

Let us denote the matrix A without the last row by A and the matrix A without the first row by 
A. 

From the viewpoint of prediction, the LRF governing a time series of rank r has coefficients 
derived from the condition S B = (sl_i, . . . , SAr_i)'^. This system of linear equations may have 
several solutions, since the vector {sl-i, ■ ■ ■ , sjy-i) belongs to the column space of the matrix S . It 
is well-known that the least-squares solution expressed by the pseudo-inverse to S yields the vector 
B with the minimum norm (the TLS-solution coincides with it). 
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It can be shown that the mmhnum-norm solution i?LS can be expressed as 

1 "" 

Bi,s = iaL-i,---,aif = - aVvTiP,, (10) 

where ttj is the last coordinates of Pi and v^ = ^l^i vr?. 

Thus, one of the vectors from C^^i ^, which equals tIls = I ^ ) i has a special significance and 

is called the min-norm (forward) prediction. Similarly, we can obtain a formula for the min-norm 
backward prediction. 

It is shown in [15, 23] that the forward min-norm prediction vector ^ls is the normalized (so that 
its last coordinate is equal to —1) projection of e^ = (Ol_i, 1)^ on the orthogonal complement to the 
signal subspace. Therefore, the min-norm prediction vector depends on the signal subspace only. 

The paper [24] contains a property of the min-norm LRF, which is very important for forecasting: 
all the extraneous roots of the min-norm LRF lie inside the unit circle on the complex plane. This 
gives us the hope that in the case of real-life time series (when the min-norm LRF and the related 
initial data are perturbed) the extraneous summands in (9) decrease and just slightly influence the 
forecast. Moreover, in view of results about the distribution of extraneous roots (see [28, 34]), we can 
suppose that the extraneous summands are able to compensate one another. 

Note that this min-norm LRF forms the basis for the forecasting methods introduced in [15]. In 
particular, the recurrent forecast is constructed by applying formula (10) with Pi = Ui, where Ui are 
taken from (1), to the last L — 1 reconstructed signal points ?7v-{L-i)) • • • ,'Sn~i- 

4.2 Dependence on the window length 

Let us consider the dependence of the forecast accuracy on the window length L (we consider the 
signal rank r as given in advance). Note that the forecasting procedure uses two objects: the LRF 
itself and the initial data for this LRF taken from the last points of the reconstructed signal Sn. Let 
us denote the vector constructed from the L — 1 last signal points by V and the vector constructed 
from the L — 1 last reconstructed (i.e., perturbed) signal points by 1/ + AV. Likewise, we denote the 
vector of coefficients of the true min-norm LRF by A and the vector of the estimated LRF coefficients 
hy A + AA. Then the forecast error is A^AV + {AA)^V + {AA)'^AV. 
Therefore, the first-order error consists of two kinds of errors: 

1. the errors in the LRF coefficients that are caused by an error of the projection onto the signal 
subspace {AA)^V; 

2. the errors of signal reconstruction A AV. 

Let us investigate these two error sources separately. To do this, we apply the LRF that was 
estimated with the window length Llrf to the true signal values and apply the true LRF to the 
estimated signal values that were reconstructed with the window length Lj-^c. 

We consider the time series (4) with a = 0.1, A^ = 399 and ln6 = 0, 0.01, —0.01. Estimation (by 
1000 simulations) of RMSE of the one-term ahead forecast is depicted in Fig. 11 for Ljcc = 200. Values 
of Llrf are varied from 20 to 380 with increment 20. Similar graphs for different Lrec are presented 
in Fig. 12. The labels of x-axis correspond to the values of Lfcc- 

The line marked 'total' shows the accuracy (RMSE) of the forecast with the corresponding window 
lengths. The line marked 'LRF' corresponds to (AA) V. The 'LRF' errors look like the errors of 
the estimator P^ of the projector on the signal space (see Fig. 4). This is not surprising since the 
coefficients of the LRF are proportional to the projection Vr+i^L^J+i l^l = Pr^L- Naturally, the 
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'LRF' errors do not depend on the window length L^^c used for reconstruction. The typical behavior 
of this part of the error is as follows: the larger the window length, the larger the error. 



-Total T»^Rec ^ LRF 




400 



Figure 11: RMSE of forecast as a function of Llrf^ t-s. (4) with 6 = 1, Lj. 
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Figure 12: RMSE of forecast: t.s. (4) with b = l,different L^ 



The line marked 'Rec' corresponds to A^'^Al/. We can see that the larger the window length, the 
smaller the error. This can be interpreted in the following way: extraneous roots that are located 
closely to the uniform distribution on a circle compensate one another (see [28, 34] for several results 
of this kind). 

Figures 11-12 show that the accuracy of forecasts is stable within a wide range of window lengths. 
In particular, Llrf and Lrec slightly smaller than N/2 are quite appropriate. Also, we can take either 
small Llrf and Lrec ~ N/2 or Llrf ~ N/2 and small Lrec- The former is the preferred choice, since 
the errors are smaller and more robust to the changes in the window length, see Fig. 12. 

Recommendations on the window length choice naturally depend on the forms of the signal and 
the residual. Figures 13 and 14 contain RMSE for damped sine waves. The interpretation of figures 
and the RMSE behavior are similar. 



The deterministic residual can provide a specific behavior of errors. Specifically, for the time 
series (2) we obtain that the choice of an even N and odd values of -Llrf provides a decrease of the 
errors of projectors as the window length increases (see Fig. 1). Therefore, for such choice of L and 
N we observe similar behavior for two sources of forecast errors and the optimal choice is that the 
window length should be as large as possible. 
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Figure 13: RMSE of forecast: t.s. (4) with b < l,different L^ 
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Figure 14: RMSE of forecast: t.s. (4) with b > l,different Lrec 

5 Subspace-based methods of parameter estimation 

While the problems of reconstruction and forecasting are traditionally included into the scope of 
problems solved by SSA, estimation of signal parameters by the subspace-based methods is often not 
considered within the framework of SSA. Therefore, we will describe the subspace-based methods in 
more detail to demonstrate their cohesion with SSA. 

5.1 Basic facts 

Definition 5.1. The companion matrix of a polynomial p{fJ-) = /i" + ci/i""^ + . . . + Cn-iH + c^ is 

/ ... -Cn \ 

10 ... -Cn-1 

1 ... -Cn-2 



\ ... 1 -Ci 



/ 



Proposition 5.1. The roots of a polynomial coincide with the eigenvalues of its companion matrix. 

Note that the multiplicities of roots are equal to the algebraic multiplicities of the eigenvalues 
of the companion matrix (i.e., to the multiplicities of roots of the matrix characteristic polynomial). 
However, these multiplicities do not always coincide with the geometric multiplicities equal to the 
dimensions of the eigenspaces corresponding to the eigenvalues. 

To derive the analytic form (9) of the signal we need to find roots of the characteristic polynomial 
of the LRF which governs the signal. By Proposition 5.1, we have to find either the roots of the 
characteristic polynomial or the eigenvalues of its companion matrix. The latter does not require a 
linear recurrent formula itself. Let us demonstrate that to find the signal roots it is sufficient to know 
the basis of the signal trajectory space. 

Let C be a full-rank d x d matrix, X G W^, and X be a full-rank L x d matrix, L > d, which can 
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be expressed as 



X 



( X^ 



(11) 



Let us again denote the matrix X without the last row by X and the matrix X without its first 
row by X. It is clear that X = XC. We call this property of X shift property given by the matrix C. 

Proposition 5.2. Let X satisfy the shift property given by the matrix C, P be a full-rank dxd matrix, 
and Y = XP. Then the matrix Y satisfies the shift property given by the matrix D = P^^CP, i.e., 
Y = YD. 



Proof. 



XP 



/ X^P 
X^CP 



/ X^F 
X^FB 



( X^F 

XTp(p-iCP) 



\ 



V xTp(p- 



ic^-ip) J 

\ 



\ X^PD^"^ / 



V y^Di-i I 



where Y = X F. This implies the shift property for Y. ■ 

Note that multiplication by a nonsingular matrix P can be considered as a change of coordinates 

in the column space of the matrix X. 

It is easily seen that the matrices C and D = P^-'^CP have the same eigenvalues; these matrices 

are called similar. 

Remark 5.1. Let the matrix Y satisfy the shift property given by the matrix D. Then D = Y' Y, 
where A^ denotes the Moore-Penrose pseudoinverse of A. 

Proposition 5.3. Let a time series F satisfy the minimal LRF (8) of order d, L > d be the window 
length, C be the companion matrix of the characteristic polynomial of this LRF. Then any L x d 
matrix Y with columns forming a basis of the trajectory space of F satisfies the shift property given 
by some matrix D. Moreover, the eigenvalues of this shift matrix D coincide with the eigenvalues of 
the companion matrix C and, therefore, with the roots of the characteristic polynomial of the LRF. 

Proof. Note that for any 0<i<iV — Iwe have 

(/i) fi+1, • • • 5 /i+(d-l))C = {fi+1, fi+2, ■ ■ ■ , fi+d)- 

Therefore, (11) holds for X = (/o,/i, • • • ,/d-i) • It is known that for a time series governed by a 
minimal LRF of order d, any d adjacent vectors of the embedding are independent. Consequently, the 
matrix X is of full rank and we can apply Proposition 5.2. ■ 

Remark 5.2. The SVD of the L-trajectory matrix of a time series provides a basis of its trajectory 
space. Namely, the left singular vectors which correspond to the nonzero singular values form such a 
basis. If we observe a time series 'signal + residual', then the SVD of its L-trajectory matrix provides 
the basis of the signal subspace under the condition of exact strong separability of the signal and the 
residual, see [151. 
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5.2 ESPRIT 

Consider a time series Fjy = {/i}j=o ^ ft — ^i~^ '^ii where Sn = {sjj^^g is a time series governed by a 
linear recurrent formula of order r (i.e. signal), i?Ar = {ri}-J^ is a residual (noise, perturbation). Let 
again X be the trajectory matrix of F/v- In the case of exact or approximate separability of the signal 
and the residual and if the signal is dominant, the subspace span{C/i, . . . , Ur} can be considered as an 
estimate of the true signal subspace. Therefore, we can use Y = Ur = [Ui : . . . : Ur] as an estimate of 
Y from Proposition 5.3. Then the shift property is fulfilled only approximately and U^D ~ U^. 

Let us study the methods of finding the matrix D. The idea was introduced in [25] devoted to the 
problem of estimating the frequencies in a sum of sinusoids, in the presence of noise. The method was 
given its well-known name ESPRIT in [30], which was later used in many other papers devoted to the 
DOA (Direction of Arrival) problem. This method is also called Hankel SVD (HSVD [3], for the LS 
version) and Hankel Total Least Squares (HTLS ["'"], for the TLS version). 

5.2.1 Least Squares (LS-ESPRIT), HSVD 

The LS-ESPRIT estimate of the matrix D is 

D = U^^lJ; = (U^^U^)-^U^^U;. (12) 

The eigenvalues of D do not depend on the choice of the basis of the subspace span{C/i, . . . , Ur}- 
In fact, if we change the coordinates so that W = U^P, where P is a nondegenerate r x r transfer 
matrix, then W^ = U^P and W = U^P. Hence, by the direct substitution we have that the estimators 
of D obtained by using W or Ur are similar matrices and therefore have the same eigenvalues. 

5.2.2 Total Least Squares (TLS-ESPRIT), HTLS 

Since U^ is known only approximately, then there are errors in both U^ and U^. Therefore, the 
solution of the approximate equality U^D w U^ based on the Total Least Squares method can be 
more accurate. 

Let us recall that to solve the equation AX ~ B TLS minimizes the following sum: 

||A- A|||-f ||B -B||| — > min, where (13) 

(A,B) e {(A,B) : 3Z, AZ = B}. 

Set A = Ur, B = Ur in (13). Then the matrix Z that minimizes (13) is called the TLS-estimate 
of D (see [8] for explicit formulas) . 

Let us consider the dependence of the TLS-ESPRIT solution on the choice of the basis of span{C/i, . . . , Ur}. 
Generally speaking (computer experiments confirm this), this dependence takes place. However, the 
following assertion can be proved. 

Proposition 5.4. The TLS-ESPRIT solution does not depend on the unitary transformation of the 
basis o/span{t^i . . . Ur}- In particular, the TLS-ESPRIT estimate is the same for any orthonormal 
basis. 

Indeed, suppose that the minimum in (13) is achieved at some Z. Let us consider the revised 
problem (13) with replacement of A and B by AP and BP, respectively. It is easy to see that if P is 
an r X r unitary matrix (and therefore preserves norms), then the solution of the revised problem (13) 
has the form of P~^ZP, which is similar to Z. Thus, the eigenvalues of the solution of the problem 
(13) do not depend on simultaneous rotations/reflections in the column spaces of the matrices A and 
B. 
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5.2.3 DOA and time series analysis 

Recall that ESPRIT-like methods can be applied both to the general problem of parameter estimation 
for the time series and to the DOA problem specifically. However, there are several special aspects of 
application to DOA: 

1. Input data is in the form of an L x ii' matrix (an analogue to the trajectory matrix). The matrix 
satisfies the shift property but it is not necessarily a Hankel matrix. 

2. The number of sensors (L) is fixed and is usually not large. Therefore, the properties of the 
method are considered under the condition that L is fixed and if — t- oo. 

3. The problem can be stated so that the entries of the input matrix are noisy with independent 
noise realizations (this does not hold for the trajectory matrices where both the signal and noise 
form Hankel matrices). 

4. The data for DOA is mostly complex-valued with complex circular white Gaussian noise. 

Generally, these aspects can influence the statement of the parameter choice problem and the rule 
of the optimal parameter choice. 

5.2.4 Dependence on the window length 

The paper [2] contains the following theoretical result: while estimating the frequency of a noisy 
sinusoid, the asymptotic (A — )• oo) variance of the first-order error has order 1/{K'^L) and is symmetric 
with respect to N/2. Therefore, the asymptotic optimal window length is equal to A/3 or 2N/3. 
Numerical experiments confirm this conclusion. Let us remark that it is not always true that the first- 
order (with respect to the perturbation level) error is the main-term error as the time series length 
tends to infinity. Therefore, it is better to check the correspondence between the first-order error and 
the total error through simulation. 

In [9] an explicit form of the asymptotic variance of the first-order error is derived in the general 
case of damped complex exponentials. In the case of undamped complex exponentials, the derived 
form coincides with that in [2]. As for damped complex exponentials, the result is that the optimal 
window length lies between A/3 and N/2 and approaches A/2 as the damping factor increases. It is 
shown in [9] that for Sn = exp((a-|-i/3)n), i = \/— T, the first-order variances of the ESPRIT estimates 
of Q and f3 are equal. Therefore, the optimal window lengths are the same for estimators of the 
damping factor a and of the frequency /3 . 

In the previous sections we demonstrated that the separability of the signal from deterministic 
and stochastic residuals has a different nature and therefore leads to different consequences. Let us 
consider how this difference reveals itself in the problem of the frequency estimation by ESPRIT. We 
perform simulations for the time series (3)-(6) with = 0" = 0.1, 6 = 1, A = 100. 

Fig. 15 contains the results for the deterministic perturbation, including the specific behavior of 
RMSE in the case of one-sided (left) orthogonality. Fig. 16 contains RMSE of frequency estimates 
for different kinds of residuals. The behavior of errors is very similar to that in the signal reconstruc- 
tion, see Fig. 8. Also, the errors of frequency and exponential rate (damping factor) estimates are 
approximately equal if L is proportional to A. The main difference from the reconstruction errors is 
in the size of errors, which is much smaller. Therefore we use 1000 instead of 100 simulated series to 
estimate RMSE with sufficient accuracy. 

Fig. 17 focuses attention on the error behavior for small window lengths. One can see that the 
dominance of errors for the time series (3) with deterministic residuals is a distinctive feature. 
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Figure 15: RMSE of frequency and exp.rate estimates: t.s. (3) (log-scale) 
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Figure 16: RMSE of frequency estimates: different types of residuals, t.s. (3)-(6), L ~ N/2 
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Figure 17: RMSE of frequency estimates: different types of residuals, t.s. (3)-(6), L is small 

5.3 Brief review of other subspace-based methods 

In this subsection, we briefly describe several subspace-based methods in addition to ESPRIT-like 
ones. These methods are applied to time series governed by LRFs and in fact estimate the main 
(signal) roots of the corresponding characteristic polynomials. The basic subspace-based methods 
were developed for the case of a noisy sum of imaginary exponentials (cisoids) or of real sinusoids, for 
the purpose of frequency estimation, see e.g. [32]. We are mostly interested in the methods that can 
be applied to any time series of finite rank given in the form (9). 

We start with the description of general methods in the complex-valued case. 

Version 1 Consider an LRF that governs the signal (the best choice is the min-norm LRF, see Section 
4. 1 , however this is not essential) . Then we find all roots fim of the characteristic polynomial of this 
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LRF and then find coefficients Cmj in (9). The coefficients Cmj corresponding to the extraneous roots 
are equal to 0. In the case of a noisy signal, Jim are the roots of a polynomial with coefficients from 
a vector that belongs to C^ , and the extraneous roots have small absolute values of the LS estimates 

Version 2 Let us consider the forward and backward min-norm predictions. It is known that the 
corresponding characteristic polynomials have the conjugate extraneous roots and their signal roots 
are connected by the relation z' = z* /WzW^. Note that the forward prediction given by a vector A G C^ 
corresponds to the roots of {Z{z),A) = 0, where Z{z) = (1, z, . . . , z^"^)"^ and (• , •) is the inner product 
in C. At the same time, the backward prediction given by a vector B G C^jr corresponds to the roots 
of {Z{l/z),B) = 0. If we consider the roots of the forward and backward min-norm polynomials 
together, then all the extraneous roots lie inside the unit circle, while one of z' and z is located on or 
beyond the unit circle. This allows us to detect the signal roots. 

Version 3 Let us take a set of vectors from £;^ . Each vector from £^ with a nonzero last coordinate 
generates an LRF. The signal roots of the characteristic polynomials of these LRFs are equal (or 
are close for a noisy signal), whereas the extraneous roots are arbitrary. Therefore, the signal roots 
correspond to clusters of roots if we consider pooled roots. One of the ways to choose vectors from 
£,;i- is to take the set of eigenvectors corresponding to noise. 

There are some other methods that are developed for estimating frequencies in a noisy sum of 
undamped sinusoids or complex exponentials. Let for simplicity s„ = Ylk=i ^k^'^^^'^''^ • ^^ *^^^ case, 
the signal roots e^'^*'^'' have absolute values equal to 1 and can be parameterized only by one parameter 
(frequency). Let W = W{uj) = ^(e^™). Since W{ujk) G ci'\ {W{uJk),A) = for aU A G 4ti,L- ^^ 
A G i2^, then we can consider the square of the cosine of the angle between W{ujk) and A as a measure 
of their orthogonality. This idea forms the basis for the Min-Norm and MUSIC methods. The names 
of methods in which roots are ordered by the absolute value of the deviation of their modules from 
the unit circle begin with 'root-'. 

Version 4. Min-Norm Let /(w) = cos'^ {W (uj) , A) , where A = P^^l S £^ is the vector corre- 
sponding to the min-norm forward prediction. The Min-Norm method consists in searching for the 
maximums of l//(w), and the function l/f{uj) of uj is interpreted as a so-called pseudospectrum with 
peaks at the frequencies presented in the signal. 

Version 5. Root Min-Norm We consider the min-norm LRF and choose the r closest to the unit 
circle roots of its characteristic polynomial. 

Version 6. MUSIC Let /(a;) = cos'^ {W (u) , C:^) . If we take eigenvectors Uj, j = r + I, . . . ,L, 
as a basis of £^, then Ur+i,LU*_,_2 ^ provides the matrix of projection on £^ and therefore /(w) = 

W*{\Jr+i,LV;_^i^L)W/\\W\\^ = Y^ fjiio), where fj{uj) = cos^{W{uj), Uj). Thus, the MUSIC method 

j=r+l 

can be considered from the viewpoint of the subspace properties and does not require the computation 
of roots of characteristic polynomials. Similar to the Min-Norm method, the MUSIC method consists 
in searching for the maximums of the pseudospectrum 1/ f[uj). 

There is a modification of MUSIC called 'EV. In this modification, the pseudospectrum is con- 
structed on the base of the weighted sum 'Ylij=r+ifii'^)/^j- "^^^ -^^ method can improve frequency 
estimates if the signal rank r is detected with error, since the weights decrease the contribution of 
summands corresponding to the noise eigenvectors that are adjacent to the signal ones. Note that the 
EV method is not expressed in terms of the signal subspace. 

Version 7. Root-MUSIC This method involves calculation of the polynomial roots. Specifically, we 
calculate the roots of a polynomial of z solving the equation {Z{\/ z))^\Jr+i,L^*.j^i l^(.^) — 0- This 
polynomial can be considered as the multiplication of polynomials with signal roots connected by the 
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Table 1: The rate of convergence of recon- 
struction: estimated A 
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Table 2: The rate of convergence of pro- 
jector estimates: estimated A 
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Table 3: The rate of convergence of expon. 
base estimates: estimated A 
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Table 4: The rate of convergence of fre- 
quency estimates: estimated A 
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relation z' 



', that is, the forward and backward predictions. Then the roots with modules 



less than or equal to 1 are assembled in the ascending order of their modules. The signal roots have 
modules that are the closest to 1. 

6 The rate of convergence 

The paper [27] contains theoretical results on convergence (as the time series length A^ tends to infinity) 
for the methods that are based on the estimation of the signal subspace. Here we investigate the rate 
of convergence by means of examples. 

Let us consider two time series with lengths A^i and A'^2 such that A'^2 = 4A'^i. Let RMSE be 
the measure of accuracy. If the residual Rn is random, then we perform simulations to estimate 
RMSE. We denote the ratio of RMSEs for the window lengths iVi and iV2 by A = RMSE1/RMSE2. 
Then, A = 8 indicates the rate of convergence 1/A^^'^, A = 2 corresponds to the rate of convergence 
1/A^*''^, and A = 1 means that there is no convergence at all. To estimate A, we use A'^i = 6399 and 
N2 = 25599 (we chose odd time series lengths to consider (A^ + l)/2 as one of window lengths) 

We discuss examples with the following three kinds of perturbation of the signal: by a constant, by 
noise and by a sum of noise and a constant. Also, we consider two types of random noise, white and 
red. We perform numerical experiments for the time series (3)-(6) with a = 0.1, b = 1, a = 0.5. In 
what follows we calculate the frequency and exponential base estimates using the LS-ESPRIT method 
(the difference with the results of TLS-ESPRIT is small and does not influence the conclusions) . 

Tables 1-4 include the results on convergence based on 1000 simulations. The column 'c' corre- 
sponds to the time series (3) with constant residuals, the columns 'wn' and 'rn' contain the results 
for white-noise (the time series (4)) and red- noise (the time series (6)) residuals respectively, and the 
column 'c+wn' includes estimates of A for the time series (5) with combined perturbation. 

It would appear reasonable that the convergence rates for fixed window lengths and for window 
lengths proportional to N differ. Also, the multiplicity of window lengths to the period of the sine- wave 
signal (10 in the considered examples) can be important. Therefore, we analyze two sets of window 
lengths. The first set includes fixed window lengths: the minimal L = r + 1, where r = 2 is the rank 
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of sinusoid, L = 20 is divisible by 10, and L = 25 is a common case. The second set contains two 
window lengths close to N/2: L = [N + l)/2 is divisible by 10 and L = [N + l)/2 — 5 is a common 
case. Note that if there is the exact separability (that is possible in the case of deterministic residuals 
only), then the ratio A cannot be calculated (the sign ' — ' in the tables). We do not consider the 
windows length L > {N + l)/2, since these values of L lead to either the same or worse convergence 
rates in comparison with the window length N — L + 1 < (A^ + l)/2. 

Let us discuss the results presented in Tables 1-4 for window lengths tending to infinity and for 
fixed window lengths separately. 

Window length L ~ N/2 The simulations provide stable estimates of the convergence rate for 
the window length L equal to one-half of the time series length (and more generally, for the window 
lengths that are proportional to A^): 

(A) "signal-|- noise" (the time series (4), (6)) 

(a) the convergence rate of the projector on the signal subspace is 1/N^'^, 

(b) the convergence rate of the reconstruction of the whole signal (average error) is 1/N^'^, 

(c) the convergence rate of the frequency and exponential rate estimates is 1/A^^'^; 

(B) "signal+constant" (the time series (3)) 

(a) the convergence rate of the projector on the signal subspace is nearly 1/A^, 

(b) the convergence rate of the reconstruction of the whole signal (average error) is nearly 1/A^^'^, 

(c) the convergence rate of the frequency and exponential rate estimates is nearly 1/iV^. 

Theoretical results on the reconstruction errors in a particular case of a noisy constant signal [17], 
see (7), provide support for a part of the conclusions derived from the simulations. Results on RMSE 
for the frequency and exponential base estimates are confirmed in [2], where the case of a noisy sinusoid 
is considered. Let us remark that the convergence rate 1/A^^'^ is not surprising, since the Cramer-Rao 
lower bound for the variance of the frequency estimates has the same order (see, e.g. [29, 31]). On 
the other hand, the Cramer-Rao lower bound for the variance of estimates of the sinusoid amplitude 
has the order 1/N'^'^, which corresponds to the convergence rate of the reconstruction of the signal. 
Simulations for the time series (6) confirm that for L ~ N/2 the red-noise residuals provide the same 
convergence rate as the white-noise residuals. 

One can see that the convergence rate in the examples with pure random residuals is much worse 
than that for the example with deterministic (constant) residuals. As one might expect, the example 
with combined residuals inherits the worst case. Simulations confirm that in the case of a constant 
residual mixed with a stochastic component (the time series (5)) and the window length L proportional 
to A^, the rate of convergence is the same as for the case (A). 

Fixed window length L = Lq Let us consider the case of a fixed window length L and A^ — t- oo. 
The behavior of the rate of convergence is more complicated than the behavior described above. 
Analysis of the reconstruction errors shows the following behavior: 

(A) for "signal -|- noise" (the time series (4), (6)), there is no convergence to the signal, even for the 
window lengths divisible by the signal (or noise) periods (divisible by 10 in the considered example); 

(B) for "signal-l-constant" (the time series (3)), the convergence holds only if L (or K = N — L-\-l) 
is divisible by the period; in general, there is no convergence. 

Consequently, if residuals contain noise, there is no convergence. Thus, in general, small window 
lengths are not suitable for the problems of signal reconstruction. 

Let us now consider the errors of projector and parameter estimates. The difference with the 
behavior of reconstruction errors consists in the presence of convergence for small L and white-noise 
residuals. However, the simulations for the time series (6) with red-noise residuals demonstrate the 
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absence of convergence for projection and frequency estimation. In Tables 2 and 4, this absence of 
convergence corresponds to the values '1.0' and '1.1' in the column entitled 'rn'. Convergence of the 
exponential base estimates still takes place, see Table 3. 

The question is how the deterministic (constant, in our examples) compound of the white-noise 
residuals influences the convergence for small L. Unfortunately, the errors for the projector and 
parameter estimates converge to for L = Lq only if the window length is divisible by the signal 
periods. 

Thus, we can conclude that using small window lengths for frequency estimation is possible only 
if the residuals are pure white noise, that is, they do not contain deterministic components and are 
independent. Otherwise, there is no convergence. 

Let us compare the estimation error for white and red noise residuals regardless of the absence 
of convergence in the latter case. Fig. 18 shows the absolute values of the estimation errors for 
L = 10. The first symbols of the line titles mean types of the estimated objects: projector ('proj'), 
exponential base ('base'), or frequency ('freq'); the last symbols designate types of residuals. One 
can see that the sizes of errors for red and white noise residuals are comparable for the considered 
time series lengths. This effect is stable enough for different parameters of the time series model. 
Thus we can perform estimation with good accuracy even in the case of the absence of convergence. 
This can be explained by the results of [27, formula (2.15)], where the upper bounds for the projector 
errors are derived. The main term of the upper bound (it is a constant depending on the sinusoid 
frequency, the parameters of red noise, and the window length) yields the proper order of errors. 
Moreover, [27, formula (2.9)] provides approximately the same small magnitude 1.3- 10"'^ as presented 
in Fig. 18, the line marked 'proj_rn'. Calculations confirm that this magnitude is approximately equal 
to iir(SS ) 5] (l — Ur (Ur ) ), where S is the autocovariance L x L matrix of the considered red 
noise. Note that this term does not converge to as the time series length tends to infinity. 
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Figure 18: Comparison of RMSE for red-noise and white-noise residuals 



As we have mentioned, the DOA problems use matrices corresponding to the case of fixed L = Lq, 
since L is the number of sensors. Therefore, the estimation of frequencies by ESPRIT-like methods 
can be performed with high accuracy only if the residual is pure white noise (or if L is large enough). 
In most existing works devoted to the performance of the subspace-based methods for the DOA 
problems, L is fixed (see, for example, [21] for the root-MUSIC performance or [26] for a wide class 
of subspace-based algorithms including ESPRIT). Papers [2] and [9] are beyond the scope of DOA 
and consider the estimation of signal parameters for an arbitrary noisy signal governed by an LRF as 
min(L,i^) — t- oo. 
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7 Choice of the window length and separabihty 

7.1 Modulated sinusoid 

Note that the abihty of SSA-hke methods to extract exponentially modulated (damped) sinusoids 
implies that the SSA method is not simply a spectral method even if we do not apply SSA for trend 
extraction. Note that both damped and undamped sinusoids have SSA-rank 2 (or rank 1 if cisoids 
are considered in the complex- valued case). This feature of SSA significantly extends the set of time 
series that are suitable for the SSA analysis. Most of the classical methods (e.g. Fourier analysis, 
seasonal decomposition) deal with either constant amplitudes or with amplitudes proportional to the 
trend (if any) and can be reduced to constant amplitudes by transferring time series to the logarithmic 
scale. However, time series consisting of several damped sinusoids cannot be reduced to periodic time 
series or to multiplicative periodicity. Let us consider, for example, a seasonal component containing 
both yearly and quarterly oscillations and let the amplitude of yearly periodicity increase whereas the 
amplitude of quarterly oscillations decrease. Then many classical methods fail whereas the SSA-like 
methods can easily extract such seasonality. 

If the behavior of modulations is more complex than the exponential one, then the SSA-like 
methods can encounter difficulties. These methods can still extract such oscillations, however the 
question of the proper choice of the window length arises. 

Let us formulate the question in a more specific form. Consider the signal in the form of s„ = 
^(n) cos(27rna;), where A{n) is a slowly (in comparison with oj) varying function. The question: are 
there any examples when the choice of the window length close to A^/2 is not good. 

We consider the time series fn = Sn + Tn with 

Sn = cos(27rn/19) + cos(27rn/21), r„ = e„. (14) 

Here s.„ = j4(n) cos(27rn/20) with A{n) = 2cos(7rn(l/19— 1/21)) is a modulated sinusoid of frequency 
1/20. The signal has rank 4 and is asymptotically separable from noise, constant residual and others. 
We have two alternative possibilities. The first possibility is to take L close to A^/2 (e.g., between 
A^/3 and A^/2) and extract the signal by four leading eigentriples. The second alternative is to take 
a window length L so small that the amplitude of the signal is almost constant within the limits of 
subseries of length L, and then to extract the signal by two leading components. In the latter case, 
the left singular vectors are close to the undamped sinusoids and the modulation is caught by the 
right singular vectors. 

Numerical simulations (see Fig. 19 for the time series lengths equal to 99, 199, 399, and 999) 
show that there is no clear choice between the described alternatives. If the time series length is large 
enough for approximate separability, then the choice L ~ A^/2 is better. Otherwise, window lengths 
close to a couple of periods provide a better accuracy. The drawback of the latter choice is that usually 
we do not know the period and therefore cannot guess the proper window length. 

Thus, under the conditions of approximate separability (one can check if these conditions are met by 
means of analysis of the decomposition results, see [15]) the best signal reconstruction uses the window 
length close to A^/2 and the number of the corresponding eigentriples is equal to the signal rank. The 
advantages of this choice are the better accuracy and the independence of the window length choice 
from the unknown values of periods. In the considered example, for L = N/2, the reconstruction by 
four eigentriples performs well starting from A^ = 160. However, the choice of L = A^/2 is appropriate 
even for A^ < 120 if we produce the reconstruction by two eigentriples. Therefore, there is a limited 
range of time series lengths (approximately from 120 to 160), where the choice of the window length 
close to A^/2 cannot provide an adequate accuracy of signal reconstruction. 
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Figure 19: RMSE of reconstruction: dependence on L/N for t.s. (14), ETl-2 and ETl-4 



Table 5 contains RMSE for signal reconstruction using two window lengths: L 
and L = N/2. The values in bold indicate smaller errors. 



40 (two periods) 



Table 5: RMSE of reconstruction: different parameter choices for t.s. (14) 





L = 40 


L = N/2 


L = N/2 
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2ET 


2ET 


4ET 


99 


0.27 


0.27 


0.45 


159 


0.23 


0.42 


0.24 


199 


0.22 


0.65 


0.25 


399 


0.20 


0.70 


0.16 


999 


0.19 


0.68 


0.11 



The case of complex-form modulation Let us consider the case where the modulated signal is 
not a signal of finite rank. Let N = 399 and fn = s„, + r„, where 



A(n)cos(27rn/20), r„ 



(TEr. 



(15) 



where A{n) = cos(27rn^/10^). Figures 20 and 22 show the initial time series with different levels of 
noise and Figures 21 and 23 contain the errors of reconstruction by 2, 4, 6, and 8 leading eigentriples. 
The choice of the window length for signal reconstruction is not crucial for low levels of noise, since 
we are able to achieve a good accuracy by choosing eigentriples for reconstruction properly (the larger 
the window length L < N/2, the larger the number of eigentriples chosen for reconstruction). 

Fig. 23 shows that for a high level of noise the choice of the window length close to N/2 does not 
provide the best accuracy. In particular, using L = 200 with reconstruction by four leading eigentriples 
is slightly worse than using L = 40 (two basic periods) and two leading eigentriples. However, if we 
take, say, L = 80, then reconstruction by two leading eigentriples is less accurate. Thus, the choice of 
L ~ N/2 is quite appropriate and can be improved only with the help of additional information (like 
the value of period) about the analyzed time series. 

Note that in this subsection we considered the choice of window lengths for signal reconstruction. 
Estimation of the basic frequency (e.g., by ESPRIT) is a separate problem, which requires special 
statements and approaches to its solution. 

7.2 The problem of mixing 

One of the problems that SSA encounters is a possible lack of strong separability [Jo] under conditions 
of weak separability. This problem is caused by equal singular values in SVDs of trajectory matrices 
of the signal and the residual. 
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Figure 20: Initial time series: 
t.s. (15), (7 = 0.1 



Figure 21: RMSE of reconstruc- 
tion: t.s. (15), a = 0.1 
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Figure 22: Initial time series: 
t.s. (15), cr = 1 



Figure 23: RMSE of reconstruc- 
tion: t.s. (15), (T = 1 



Let us consider some deterministic slowly varying component (i.e., a trend) as a signal. The signal 
does not necessarily have a finite rank. More likely, it can be approximated by a time series of finite 
rank Tappr- Let Q be the share of rappr leading squared singular values in the SVD of the trajectory 
matrix of the trend. Let us denote by A^|l^ the maximal singular value generated by the trajectory 
matrix of the residual and by A^?" the rappr-th singular value of the trajectory matrix of the trend. 

The case of approximate separability corresponds to large Q under the condition A^l'^ < -^m?n'^- ^^ 
Q is fixed, then, in general, the larger the window length, the larger the rank of the approximating time 
series and the smaller is \^^^^ ■ This observation can lead to the optimal window lengths considerably 
smaller than N/2. 

We consider the example 



cos(27rnVlO^ 



e„ + 0.5cos(27rn/10). 



(16) 



N = 199 (Fig. 24). It is easy to check numerically that for L = 100 we can get rappr = 2 with 
Q = 99.5% = 92% + 7.5% (the singular values are equal to 74 and 21), whereas for L = 30, rappr = 1 
with Q = 98% (the singular value is 41). However, the residual produces the maximal singular value 



25 



equal approximately to 25 for L = 100 and to 15 for L = 30. This means that we have no strong 
separability for L = 100 with Q = 99.5%. Thus, we can reach Q = 92.5% for L = 100 and Q = 98% 
for L = 30 (certainly, these are just rough measures) to satisfy AJ^|^ < A^?°*^, that is confirmed by 
Fig. 25. Note that after extracting the trend with Li = 30 the periodicity can be extracted from the 
residual with a window length close to L2 = 100. In [ ] this technique is called Sequential SSA. 
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Figure 24: Initial time series: 
t.s. (16) 



Figure 25: RMSE of reconstruc- 
tion: t.s. (16) 



8 SSA processing of stationary time series 

There are special recommendations concerning the choice of parameters for stationary time series. 
It is traditionally recommended to perform the centering procedure for the stationary time series 
before processing (i.e., to subtract the average over the time series) and then to use the Toeplitz 
autocovariance matrix C with entries 



1 



-jj 



N 



N-\i-j\-l 

/ _, JinJm+\i—j\ 



m=0 



instead of C = XX at the decomposition stage (see [15] for details of the Toeplitz SSA algorithm). 
Let us remark that using C to obtain the SVD of the trajectory matrix is sometimes called 'BK' 
following [G], while using C to get the eigenvectors is initiated by the spectral analysis and is called 
'VG' following [36]. The use of C does not provide us with the SVD of the trajectory matrix and 
therefore with the SVD optimality. 

The papers related to the SSA analysis of climate time series (e.g. [13]) consider the Toeplitz 
SSA as the main version and state that the Basic and Toeplitz versions only slightly differ. Our 
investigation shows that Toeplitz SSA provides more stable SSA results (reconstruction, forecast, 
estimates). However, these results can be inadequate and can have a considerable bias if the time 
series we analyze is not stationary. It seems that using the Toeplitz version of the SSA algorithm is 
unsafe if the time series contains a trend or oscillations with increasing or decreasing amplitudes. 

Here we can apply a well-known principle: if the method assumes a model, then it gives more 
precise results when the model is valid; otherwise, the method can produce completely wrong results. 
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Centering the time series is a less risky procedure than using C and can either sHghtly improve 
the SSA resuhs or worsen them. Let us note that centering usuaUy increases the rank of the signal 
making the signal structure more complicated. 

Let us recall that besides SSA with centering as preprocessing there are the so called "Single 
centering SSA" [15], which is appropriate for time series with a constant trend, and also the so called 
"Double centering SSA" [L5], which works well for time series with linear trends. 

In the following subsections we demonstrate the examples of application of the centering procedure 
and the Toeplitz SSA algorithm to non-stationary time series. 

8.1 Centering as preprocessing 

Let A^ = 199 and fn = Sn + fn with 



1.005", rn = en, 



(17) 



see Fig. 26. 

The rank of the exponential series is 1 and therefore for its extraction we should choose just 
one eigentriple. After centering this time series, we obtain a new one Gm = {qq, . . . ,gN-i) with 
Qn = 1.005" — c + e„. Therefore, we should choose two eigentriples to extract 1.005" — c, i.e., we 
artificially create a more complex structure of the signal. The results of simulations (see Fig. 27, L 
changes from 5 to 100 with increment 5) confirm that the thickening of the signal structure ends in 
the increase of errors. 

Practically, the centering procedure can increase the errors of reconstruction even of undamped 
sinusoids in short time series if the time series length is not divisible by the sine-wave period. The 
explanation is similar to that for the exponential time series: if the time series length is not divisible 
by the sine-wave period, then the average over the time series is not equal to 0. Therefore, after 
subtracting this average we transform the signal of rank 2 to a signal of rank 3. For long time series 
the effect of the rank increase is diminished because the time series average is almost zero. However, 
in such a case centering has no sense. 
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Figure 26: Initial time series: t.s. (17) 



Remark 8.1 (About filling in the missing data). Let us mention that the method of imputation of 
missing values introduced in [16] does not imply centering (although centering can be used). In the 
method from [19] the centering procedure is essential, since the missing values are replaced with zeros 
at the first iteration. It seems that this peculiarity is caused by the fact that the first applications of 
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Figure 27: RMSE of reconstruction: t.s. (17), reconstruction by ETl with no centering and by ETl-2 
with the prehminary centering 

SSA have been oriented at stationary time series. Certainly, to generahze the method proposed in [19], 
we can fill in the missing values using the average over the whole time series or use an interpolation 
method based on neighboring non-missing values. 

Remark 8.2 (About eigenvalues). Eigenvalues play an important role in the analysis of stationary 
time series (see, for example, papers devoted to Monte Carlo SSA for detecting a signal in red noise 
[1]). For stationary time series, the share of eigentriples corresponding to the signal represents the 
contribution of the corresponding component. However, in the general case of arbitrary time series, 
there is little point in the eigentriples share, since it depends on absolute values of the times series: 
in Basic SSA, the distribution of eigenvalues (mostly, leading) depends on the constant compound of 
the time series while the structure of the time series should not depend on addition or substraction 
of a constant. In particular, if the leading eigentriple takes 99.9%, this does not mean that it is 
enough to take only one eigentriple to approximate the time series with high accuracy. Certainly, 
we can use information on eigenvalues to identify pairs of eigentriples generated by sinusoids (one 
sinusoid produces two close or equal eigenvalues). However, the same identification can be done by 
more powerful tools. 

8.2 Toeplitz SSA 

Let us demonstrate the consequences of improper use of Toeplitz SSA. First, application of Toeplitz 
SSA to non-stationary signals of finite rank r generally increases the number of nonzero eigenvalues 
from r to the maximal possible value equal to min(L, K), that is, the structure can be lost. This means 
that the number of eigentriples required for accurate reconstruction increases. Also, the constructed 
approximation can have a wrong structure and, for example, lead to a wrong forecast. 

We consider two examples of time series of finite rank, /„ = 1.005" cos (27rn/20) (rank 2) and 
/„ = 1.005" (rank 1). To get an accurate approximation of the last time series points, we perform 
Toeplitz-SSA reconstruction with L = 200 based on ETl-14 (ETl-2 for the first example and ETl for 
the second one do not provide good accuracy). Figures 28 and 29 demonstrate that the forecast based 
on the chosen components is inadequate (moreover, this conclusion does not depend on the chosen 
window length L and number of components). 

One can see that the Toeplitz-SSA forecast is completely wrong while the Basic SSA forecast of 
finite-rank time series is precise. 
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Figure 28: Toeplitz SSA forecast for /„ = 
1.005" cos(27rn/20), N = 399, L = 200, 
ETl-14 



Figure 29: Toeplitz SSA forecast for /„ 
1.005", N = 399, L = 200, ETl-14 



9 SVD-origins of SSA and the choice of SSA parameters 

The key step in SSA is the Singular Value Decomposition (SVD) of the trajectory matrix. The SVD 
is used for solving different problems, including statistical methods in data analysis. Therefore, the 
logic of these procedures can also be extended to SSA. Let us briefly describe several origins of SVD- 
related ideas and views on the SSA-parameter choice generated by them. We do not aim to review 
the literature on SVD but instead just emphasize the relation between origins and the methodology 
of SSA. 

Principal Component Analysis (PCA) This origin is characterized by different kinds of manip- 
ulations with variables and with cases in data, i.e., with rows and columns of the trajectory matrix. 
In particular, the conventional manipulations are centering and standardization of variables. This 
induces different views on eigenvectors and factor vectors (vectors of principal components), where 
the latter is interpreted as components of the original time series (maybe, just a little shorter than 
the original time series). In addition, greater attention to eigenvalues' contribution is transferred from 
PCA to SSA. This attitude to the trajectory matrix is appropriate when we apply Single centering 
SSA and when the number of rows (L) is fixed and is smaller than the number of columns {K). In 
the general case, the structure of the trajectory matrix does not depend on the transposition of the 
trajectory matrix, the interpretation of eigenvalues is not so important and, in particular. Double 
centering is often more natural than Single one. 

Hankel rank-deficient matrices The relation between such matrices and time series governed by 
linear recurrent formulas has long been known (see e.g. [12]). This technique allows us to analyze noisy 
time series governed by LRFs. The main application of this idea refers to the signal processing with 
its approaches to the choice of the method's parameters and consists in the analysis of a noisy sum 
of damped/undamped cisoids and the estimation of their parameters (mostly, frequencies). However, 
this approach can be applied to parameter estimation of arbitrary signals governed by LRFs. 

Spectral analysis [14] The name of this approach is close to Singular Spectrum Analysis. However, 
the name 'Spectral analysis' means the analysis of stationary time series and their frequency charac- 
teristics. On the other hand, Singular Spectrum is related to the spectrum of linear operators, i.e. 
singular values of trajectory matrices in the case of SSA. Thus, the analysis of Singular Spectrum does 
not imply stationarity. As has been stated in [10], the name SSA does not reflect the multifaceted 
entity of SSA and is traditionally used. Peculiarity of this origin consists in centering the time series 
before processing and subsequent application of the Toeplitz version of SSA. As was demonstrated in 
Section 8, if we apply this technique to non-stationary time series, it is likely that we obtain either 
imprecise or even meaningless results. Within the framework of Spectral analysis approach, special 
attention is paid to the red noise (autoregression of order 1) and testing the null hypothesis of the 
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absence of a signal in red noise. In some sense, this corresponds to the problem with weak signal and 
strong noise (in contrast to the previous origin). 

Karhunen-Loeve expansion [4] This expansion is conventionally used in the theory of stochastic 
processes and originally assumes zero expectation of elements of the considered stochastic process (or 
subtracting the known averages). In [4], estimation of the process average is performed by using the 
moving average. Then the errors from this estimation are included in the centered stochastic process 
and this allows one to apply the method to processes with trends. Thus, the algorithm appears to be 
very close to PCA and in fact coincides with Single centering SSA. It seems that this origin is used by 
researchers who are well familiar with the stochastic process techniques and therefore the association 
with KL expansion helps them to understand the SSA method. 

Dynamical systems [6, 11] This origin is related to special problems in the theory of dynamical 
systems with a specific approach to the choice of parameters. However, the contribution of these papers 
is considerable, since the described algorithm served as an origin of SSA ideas in several applied areas. 

10 Conclusion 

In this paper we have considered the SSA-related methods from a unified point of view. The approach 
to the investigation of these methods was formulated for the 'signal + residual' time series. This 
enabled us to show the similarity and the specifics of the problems which can be solved by the 
considered methods. 

The accuracy of the methods and the choice of parameters were studied. In the paper we em- 
phasized on the internal mechanism of error origins. The understanding of this internal mechanism 
together with computer simulations for several typical examples enabled us to formulate the recom- 
mendations on the optimal choice of the window length L, which is the main parameter in SSA. In 
particular, the choice of L close to one-half of the time series length was approved as appropriate in 
most cases. Classes of time series for which this choice should be corrected were indicated. 
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